This report summarises differential gene abundance analysis as performed by the nf-core/differentialabundance pipeline.
A summary of sample metadata is below:
Comparisons were made between sample groups defined using metadata columns, as described in the following table of contrasts:
Input was a matrix of 14859 genes for 36 samples, reduced to 14830 genes after filtering for low abundance.
The following plots show the abundance value distributions of input matrices. A log2 transformation is applied where not already performed.
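As a minimal sketch of the log2 transformation mentioned above: the report does not state whether a pseudocount is added before taking logs, so the `pseudocount` parameter and the function name `log2_transform` are assumptions made for illustration only.

```python
import math

def log2_transform(matrix, pseudocount=1.0):
    """Return log2(value + pseudocount) for each entry of a genes x samples matrix.

    The pseudocount is an assumption; it avoids log2(0) for zero abundances.
    """
    return [[math.log2(v + pseudocount) for v in row] for row in matrix]

# Toy abundance matrix: 2 genes x 3 samples (made-up values).
counts = [[0, 3, 7], [15, 31, 63]]
logged = log2_transform(counts)
```

With a pseudocount of 1, an abundance of 0 maps to 0 and an abundance of 7 maps to 3.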
Whiskers in the above boxplots show 1.5 times the inter-quartile range.
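The whisker rule can be sketched as follows, assuming the common Tukey convention in which each whisker ends at the most extreme data point lying within 1.5 times the inter-quartile range of the nearest quartile. The function name `whisker_limits` and the choice of quantile method are assumptions; the plotting code used by the pipeline may compute quartiles slightly differently.

```python
import statistics

def whisker_limits(values, k=1.5):
    """Return (lower, upper) whisker ends under the Tukey 1.5 x IQR convention."""
    # "inclusive" gives linear interpolation between order statistics (assumed here).
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    # Whiskers stop at the most extreme observations still inside the fences.
    inside = [v for v in values if low_fence <= v <= high_fence]
    return min(inside), max(inside)

# Toy data: the value 30 falls outside the upper fence and would be drawn as a point.
limits = whisker_limits([1, 2, 3, 4, 5, 6, 7, 8, 30])
```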
Principal components analysis was conducted based on the 500 most variable genes. Each component was annotated with its percent contribution to variance.
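The two steps named above, selecting the most variable genes and annotating components with their percent contribution, might look like the sketch below. The decomposition itself is omitted; `top_variable_genes` and `percent_variance` are hypothetical helper names, and the per-component variances passed to `percent_variance` are assumed to come from whatever PCA routine is used.

```python
import statistics

def top_variable_genes(matrix, gene_ids, n_top=500):
    """Return the ids of the n_top genes with the largest sample variance."""
    variances = {g: statistics.variance(row) for g, row in zip(gene_ids, matrix)}
    ranked = sorted(gene_ids, key=lambda g: variances[g], reverse=True)
    return ranked[:n_top]

def percent_variance(component_variances):
    """Convert per-component variances into percent contributions."""
    total = sum(component_variances)
    return [100.0 * v / total for v in component_variances]

# Toy example: 3 genes x 3 samples, keeping the 2 most variable genes.
toy_matrix = [[1, 1, 1], [1, 2, 3], [0, 5, 10]]
selected = top_variable_genes(toy_matrix, ["g1", "g2", "g3"], n_top=2)
contributions = percent_variance([6.0, 3.0, 1.0])
```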
For the variance-stabilised matrix, an ANOVA test was used to determine associations between continuous principal components and categorical covariates (including the variable of interest).
The resulting p values are illustrated below.
The variable ‘condition’ shows an association with PC6 (4.1%) (p = 0.08). The variable ‘MB’ shows an association with PC1 (35.4%) (p = 0.09). The variable ‘media’ shows an association with PC1 (35.4%) (p < 0.01). The variable ‘day’ shows an association with PC4 (6.7%) (p = 0.01). The variable ‘replicate’ shows an association with PC4 (6.7%) (p = 0.02).
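To illustrate the kind of test behind these associations, here is a minimal one-way ANOVA sketch relating the scores of one principal component to one categorical covariate. It computes only the F statistic and its degrees of freedom; the p value would then be the upper-tail probability of the corresponding F distribution. The function name `one_way_anova_f` and the toy data are assumptions, not the pipeline's own code.

```python
def one_way_anova_f(scores, groups):
    """Return (F, df_between, df_within) for PC scores split by a categorical label."""
    by_group = {}
    for score, label in zip(scores, groups):
        by_group.setdefault(label, []).append(score)

    n, k = len(scores), len(by_group)
    grand_mean = sum(scores) / n

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(vals) * (sum(vals) / len(vals) - grand_mean) ** 2
        for vals in by_group.values()
    )
    # Variation of the samples around their own group mean.
    ss_within = sum(
        (x - sum(vals) / len(vals)) ** 2
        for vals in by_group.values()
        for x in vals
    )

    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Toy PC scores for six samples in two groups (made-up values).
f_stat, df_b, df_w = one_way_anova_f([1, 2, 3, 4, 5, 6], ["a", "a", "a", "b", "b", "b"])
```

A large F relative to its degrees of freedom corresponds to a small p value, that is, the component separates the groups of that covariate.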
A hierarchical clustering of genes was undertaken based on the top 500 most variable genes. Distances between genes were estimated based on Spearman correlation and then used to produce a clustering via the ward.D2 method with hclust() in R.
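A correlation-based distance of this kind can be sketched as below. The report does not say how the Spearman correlation is converted to a distance, so the `1 - rho` form is an assumption, as are the helper names. The Ward clustering step is not reproduced here.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman_distance(x, y):
    """Assumed distance: 1 minus the Spearman rank correlation."""
    return 1.0 - pearson(average_ranks(x), average_ranks(y))

# Profiles that rise together are close; opposite trends are maximally distant.
d_same = spearman_distance([1, 2, 3, 4], [10, 20, 30, 40])
d_opposite = spearman_distance([1, 2, 3, 4], [40, 30, 20, 10])
```

Under this convention the distance ranges from 0 (identical rank order) to 2 (exactly reversed rank order).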
Outlier detection based on median absolute deviation was undertaken; the outlier scoring is plotted below.
3 possible outliers were detected in groups defined by media: X3270R3D3, WT701R4D4, X32G0R1D4
2 possible outliers were detected in groups defined by MB: X3270R3D3, X32G0R1D4
2 possible outliers were detected in groups defined by day: X3270R3D3, WT701R4D4
1 possible outlier was detected in groups defined by condition: WT701R4D4
3 possible outliers were detected in groups defined by replicate: X32G0R1D4, X3270R3D3, WT701R4D4
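The general idea of scoring by median absolute deviation (MAD) within a group can be sketched as follows. This is a generic illustration, not the pipeline's exact scoring: the per-sample summary values, the scaling constant 1.4826, the threshold of 5 and the function names are all assumptions.

```python
import statistics

def mad_scores(values, scale=1.4826):
    """Score each value by its distance from the median in scaled MAD units."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        # No spread to scale by; treated here as "no outliers" (an assumption).
        return [0.0 for _ in values]
    return [(v - med) / (scale * mad) for v in values]

def flag_outliers(sample_ids, values, threshold=5.0):
    """Return ids whose absolute MAD score exceeds a hypothetical threshold."""
    scores = mad_scores(values)
    return [s for s, z in zip(sample_ids, scores) if abs(z) > threshold]

# Toy per-sample summary values for one group (made-up numbers).
group_ids = ["s1", "s2", "s3", "s4", "s5"]
group_values = [10, 11, 9, 10, 30]
flagged = flag_outliers(group_ids, group_values)
```

Because scoring is done per group, the same sample can be flagged under one grouping variable and not under another, as in the lists above.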
Filtering was carried out by selecting genes with an abundance of at least 1 in at least 1 sample.
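A minimal sketch of this filtering rule, assuming a genes x samples matrix held as a list of rows; the function name `filter_low_abundance` and its parameter names are hypothetical.

```python
def filter_low_abundance(matrix, gene_ids, min_abundance=1, min_samples=1):
    """Keep genes reaching min_abundance in at least min_samples samples."""
    kept_ids, kept_rows = [], []
    for gene, row in zip(gene_ids, matrix):
        n_passing = sum(1 for v in row if v >= min_abundance)
        if n_passing >= min_samples:
            kept_ids.append(gene)
            kept_rows.append(row)
    return kept_ids, kept_rows

# Toy matrix: gA and gC never reach an abundance of 1 and are dropped.
toy = [[0, 0, 0], [0, 2, 0], [0.5, 0.9, 0.2], [5, 7, 9]]
kept_ids, kept_rows = filter_low_abundance(toy, ["gA", "gB", "gC", "gD"])
```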
Note: For a more detailed accounting of the software and commands used (including containers), consult the execution report produced as part of the ‘pipeline info’ for this workflow.
Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020 Mar;38(3):276-278. doi: 10.1038/s41587-020-0439-x. PubMed PMID: 32055031.
Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.
Subramanian A, Tamayo P, Mootha VK, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A. 2005;102(43):15545-15550.
R Core Team (2017). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
Jonathan R Manning (2022). Shiny apps for NGS etc based on reusable components created using Shiny modules. Computer software. Vers. 1.5.3. Jonathan Manning, Dec. 2022. Web.
Love MI, Huber W, Anders S (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15(12):550. PubMed PMID: 25516281; PubMed Central PMCID: PMC4302049.
H. Wickham (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York.
C. Sievert (2020). Interactive Web-Based Data Visualization with R, plotly, and shiny. Chapman and Hall/CRC Florida.
Trevor L Davis (2018). optparse: Command Line Option Parser.
Erich Neuwirth (2014). RColorBrewer: ColorBrewer Palettes.
Morgan M, Obenchain V, Hester J and Pagès H (2020). SummarizedExperiment: SummarizedExperiment container.
Allaire JJ, Xie Y, McPherson J, Luraschi J, Ushey K, Atkins A, Wickham H, Cheng J, Chang W, Iannone R (2022). rmarkdown: Dynamic Documents for R.
Anaconda Software Distribution. Computer software. Vers. 2-2.4.0. Anaconda, Nov. 2016. Web.
Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, Valieris R, Köster J; Bioconda Team. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods. 2018 Jul;15(7):475-476. doi: 10.1038/s41592-018-0046-7. PubMed PMID: 29967506.
da Veiga Leprevost F, Grüning B, Aflitos SA, Röst HL, Uszkoreit J, Barsnes H, Vaudel M, Moreno P, Gatto L, Weber J, Bai M, Jimenez RC, Sachsenberg T, Pfeuffer J, Alvarez RV, Griss J, Nesvizhskii AI, Perez-Riverol Y. BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics. 2017 Aug 15;33(16):2580-2582. doi: 10.1093/bioinformatics/btx192. PubMed PMID: 28379341; PubMed Central PMCID: PMC5870671.
Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi: 10.1371/journal.pone.0177459. eCollection 2017. PubMed PMID: 28494014; PubMed Central PMCID: PMC5426675.